Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. The result is a principled Bayesian learning algorithm that adds gradient noise during training (enhancing exploration of the model-parameter space) and averages over models at test time. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach over stochastic optimization.
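For concreteness, the "gradient noise during training, model averaging when testing" recipe can be illustrated with stochastic gradient Langevin dynamics (SGLD), the simplest SG-MCMC sampler. The sketch below is a minimal illustration, not the paper's exact algorithm: it assumes PyTorch, and the function names and hyperparameters are hypothetical.

```python
import torch

@torch.no_grad()
def sgld_step(params, lr):
    """One SGLD update (Welling & Teh, 2011): a half-step of SGD on the
    negative log posterior plus Gaussian noise whose variance equals the
    step size. Assumes each .grad already holds a stochastic gradient of
    the negative log posterior, rescaled to the full training set."""
    for p in params:
        if p.grad is None:
            continue
        # Injected noise ~ N(0, lr) drives exploration of the parameter space.
        p.add_(-0.5 * lr * p.grad + lr ** 0.5 * torch.randn_like(p))

@torch.no_grad()
def averaged_prediction(snapshots, x):
    """Test-time model averaging: average the predictive distributions of
    weight samples (model snapshots) collected along the SGLD trajectory."""
    probs = [torch.softmax(model(x), dim=-1) for model in snapshots]
    return torch.stack(probs).mean(dim=0)
```

In this setup, snapshots of the weights saved periodically during the noisy training run serve as approximate posterior samples, and averaging their predictions approximates the Bayesian predictive distribution.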